SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
Authors
Abstract
Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses. Recently, Transformer has achieved remarkable success in the natural language processing field and has demonstrated its adaptation to speech. However, previous works on Transformer in the speech field have not incorporated the properties of speech, leaving the full potential of Transformer unexplored. In this paper, we consider the characteristics of speech and propose a general structure-based framework, called SpeechFormer++, for paralinguistic speech processing. More concretely, following the component relationship in the speech signal, we design a unit encoder to model the intra- and inter-unit information (i.e., frames, phones, and words) efficiently. According to the hierarchical relationship, we utilize merging blocks to generate features at different granularities, which is consistent with the structural pattern in the speech signal. Moreover, a word encoder is introduced to integrate word-grained features into each unit encoder, which effectively balances fine-grained and coarse-grained information. SpeechFormer++ is evaluated on the speech emotion recognition (IEMOCAP & MELD), depression classification (DAIC-WOZ) and Alzheimer's disease detection (Pitt) tasks. The results show that SpeechFormer++ outperforms the standard Transformer while greatly reducing the computational cost. Furthermore, it delivers superior results compared to the state-of-the-art approaches.
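The two ideas named in the abstract, restricting attention to nearby units and merging features into coarser granularities, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `merge_units` and `local_attention_mask`, the use of average pooling for merging, and the window and merge sizes are all assumptions chosen for illustration.

```python
def merge_units(features, merge_size):
    """Average each run of `merge_size` consecutive feature vectors into one.

    Assumption: merging is modeled as plain average pooling; the paper's
    merging block may differ.
    """
    if merge_size < 1:
        raise ValueError("merge_size must be at least 1")
    merged = []
    for start in range(0, len(features), merge_size):
        group = features[start:start + merge_size]
        dim = len(group[0])
        merged.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return merged


def local_attention_mask(num_tokens, window):
    """Return a boolean matrix where token i may attend to token j
    only if j lies within `window // 2` positions of i."""
    half = window // 2
    return [[abs(i - j) <= half for j in range(num_tokens)] for i in range(num_tokens)]


# Hypothetical toy input: 12 frames with 2-dimensional features.
frames = [[float(t), float(2 * t)] for t in range(12)]

# Hypothetical merge sizes: 3 frames per phone, 2 phones per word.
phones = merge_units(frames, 3)
words = merge_units(phones, 2)

# Windowed attention touches far fewer token pairs than full attention.
mask = local_attention_mask(len(frames), 5)
allowed_pairs = sum(sum(row) for row in mask)
full_pairs = len(frames) ** 2
```

With a fixed window, the number of attended pairs grows roughly linearly with sequence length instead of quadratically, which is one plausible reading of the computational savings the abstract reports.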
Similar resources
Paralinguistic elements in speech synthesis
Corpus based text-to-speech systems currently produce very natural synthetic sentences, though limited to a neutral inexpressive speaking style. Paralinguistic elements are some of the expressive features one would most like to introduce. In this paper, we describe a new method for introducing laughter and hesitation in synthetic speech. Thanks to a small dedicated acoustic database, this metho...
Efficient processing of hierarchical graphs
"Efficient processing of hierarchical graphs" (1990). Retrospective Theses and Dissertations. Paper 9385. The most advanced technology has been used to photograph and reproduce this manuscript from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer ...
A Hierarchical Framework for Efficient Multilevel Visual Exploration and Analysis
The purpose of data visualization is to offer intuitive ways for information perception and manipulation, especially for non-expert users. Most traditional visualization tools and methods operate in an offline way, limited to accessing static (preprocessed) sets of data. They also restrict themselves to dealing with small dataset sizes, which can be easily visually analysed with conventional vi...
An Efficient Curvelet Framework for Denoising Images
The Wiener filter suppresses noise efficiently. However, it blurs the output image. Curvelet preserves the edges of natural images perfectly, but it produces visual distortion artifacts and fuzzy edges in the restored image, especially in homogeneous regions of images. In this paper, a new image denoising framework based on the Curvelet transform and the Wiener filter is proposed, which can stop nois...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2023
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2023.3235194